Reduced Representations for Efficient Analysis of Genomic Data; from Microarray to High-throughput Sequencing
Authors: Md Pavel Mahmud
Abstract
Dissertation: Reduced Representations for Efficient Analysis of Genomic Data; From Microarray to High-throughput Sequencing, by Md Pavel Mahmud. Dissertation Director: Prof. Alexander Schliep.
Since the genomics era started in the 1970s, microarray technologies have been used extensively for biological applications such as gene expression profiling and the detection of copy number variation (CNV) or single nucleotide polymorphisms (SNPs). To analyze microarray data, numerous statistical and algorithmic techniques have been developed over the last two decades; in particular, Hidden Markov Models (HMMs) have been used successfully for detecting CNV from array comparative genomic hybridization (arrayCGH) data. Still, for computational reasons, the benefits of Bayesian HMMs have been overlooked, and their use in practice has been minimal at best. The large demand for computational resources has also affected the analysis of high-throughput sequencing (HTS) data, which over the last few years has started to revolutionize the field of computational biology. For example, the most sensitive tools for mapping HTS data to reference genomes are generally passed over in favor of faster, less accurate ones. In this dissertation, we strive for reduced representations of biological data that enable efficient computation on large datasets. Since biological datasets often contain repetitive, sometimes redundant, elements, it is natural to identify groups of similar elements and perform computations directly on these groups. Usually,
Similar resources
I-37: Establishing High Resolution Genomic Profiles of Single Cells Using Microarray and Next-Generation Sequencing Technologies
The nature and pace of genome mutation is largely unknown. Standard methods to investigate DNA-mutation rely on arraying or sequencing DNA from a population of cells, hence the genetic composition of individual cells is lost and de novo mutation in cell(s) is concealed within the bulk signal. We developed methods based on (SNP-) arraying and next-generation sequencing of single-cell whole-genom...
Strategies and Clinical Applications of Next Generation Sequencing
DNA sequencing is one of the most valuable techniques in molecular biology; it is used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology: the whole human genome can now be sequenced at low cost in several days. NGS technology is simi...
GenomicTools: a computational platform for developing high-throughput analytics in genomics
MOTIVATION: Recent advances in sequencing technology have resulted in a dramatic increase in sequencing data, which, in turn, requires efficient management of computational resources, such as computing time and memory, as well as prototyping of computational pipelines. RESULTS: We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and...
Key challenges for next-generation pharmacogenomics
The “post-genomic revolution” has advanced our understanding of the molecular etiology of a range of human genetic diseases, which might lead to improved disease prognosis and treatment. Over the past decade, genomics research has revealed the genomic variants underlying diseases, from single nucleotide variations to complex genome rearrangements, and/or altered gene expression patterns that l...
Evaluation of Expressed Sequence Tag Clustering
Bioinformatics — the application of computer technology to the management of biological information — is essential to deciphering the genetic code of life. Novel approaches to genome sequencing, such as microarray technology, high-performance supercomputing and computational simulations in high-throughput DNA analysis have led to an explosion of genomic data available. Accurate genomic assembly...